CMU PRF using a Comparable Corpus
Abstract
We applied a PRF (Pseudo-Relevance Feedback) system to both the monolingual task and the German->English task. We focused on the effects of extracting a comparable corpus from the given newspaper data; our corpus doubled the average precision when used together with the provided parallel corpus. PRF performance was lower for queries with few relevant documents. We also examined the effects of running the PRF first-step retrieval in the source-language half of the parallel corpus vs. the entire document collection.
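The abstract does not spell out the retrieval pipeline, so the following is only a minimal sketch of how cross-lingual PRF over an aligned corpus is commonly structured: retrieve the best-matching source-language documents, collect terms from their aligned target-language counterparts, and use those terms to query the target collection. The function names (`prf_translate_query`, `retrieve`), the overlap-based scoring, and the toy sentences are all illustrative assumptions, not the system described above.

```python
from collections import Counter
import math


def tokenize(text):
    # Naive whitespace tokenizer (assumption; a real system would stem and drop stopwords).
    return text.lower().split()


def score(query_terms, doc_terms):
    # Toy relevance score: query-term frequency, damped by document length.
    if not doc_terms:
        return 0.0
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms) / math.sqrt(len(doc_terms))


def prf_translate_query(query, aligned_pairs, top_k=2, n_terms=3):
    """First PRF step: search the source-language half of the aligned corpus,
    then pull the most frequent terms from the target-language half of the
    top_k matching pairs."""
    q = tokenize(query)
    ranked = sorted(aligned_pairs,
                    key=lambda pair: score(q, tokenize(pair[0])),
                    reverse=True)
    term_counts = Counter()
    for src, tgt in ranked[:top_k]:
        if score(q, tokenize(src)) > 0:  # skip pairs that do not match at all
            term_counts.update(tokenize(tgt))
    return [term for term, _ in term_counts.most_common(n_terms)]


def retrieve(query_terms, collection):
    """Second step: rank the target-language collection with the feedback terms.
    Returns document indices, best first."""
    return sorted(range(len(collection)),
                  key=lambda i: score(query_terms, tokenize(collection[i])),
                  reverse=True)


# Hypothetical German/English aligned pairs and a small English collection.
pairs = [
    ("der hund bellt laut", "the dog barks loudly"),
    ("hund und katze", "dog and cat"),
    ("die börse fällt", "the stock market falls"),
]
collection = ["a dog chased the ball", "stock prices rose"]

english_terms = prf_translate_query("hund", pairs, top_k=2, n_terms=1)
ranking = retrieve(english_terms, collection)
```

In this sketch the aligned pairs could come from either a parallel corpus or a comparable corpus; the only requirement is that each source-language document has a target-language counterpart on the same topic.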
Similar resources
CMU Haitian Creole-English Translation System for WMT 2011
This paper describes the statistical machine translation system submitted to the WMT11 Featured Translation Task, which involves translating Haitian Creole SMS messages into English. In our experiments we try to address the issue of noise in the training data, as well as the lack of parallel training data. Spelling normalization is applied to reduce out-of-vocabulary words in the corpus. Using ...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Baseline Acoustic Models for Brazilian Portuguese Using CMU Sphinx Tools
Advances in speech processing research rely on the availability of public resources such as corpora, statistical models and baseline systems. In contrast to languages such as English, there are few specific resources for Brazilian Portuguese. This work describes efforts aiming to narrow this gap. Baseline acoustic models for Brazilian Portuguese were built using the CMU Sphinx toolkit and pub...
Cross-Language Pseudo-Relevance Feedback Techniques for Informal Text
Previous work has shown that pseudo relevance feedback (PRF) can be effective for cross-lingual information retrieval (CLIR). This research was primarily based on corpora such as news articles that are written using relatively formal language. In this paper, we revisit the problem of CLIR with a focus on the problems that arise with informal text, such as blogs and forums. To address the proble...
Comparative study of boosting and non-boosting training for constructing ensembles of acoustic models
This paper compares the performance of Boosting and non-Boosting training algorithms in large vocabulary continuous speech recognition (LVCSR) using ensembles of acoustic models. Both algorithms demonstrated significant word error rate reduction on the CMU Communicator corpus. However, both algorithms produced comparable improvements, even though one would expect that the Boosting algorithm, whi...